fix(s2): enforce 1 rps throttling across S2 stages #5
Closed
- Guard json.loads() in analysis.py with try/except JSONDecodeError
- Add s2_timeout config setting (default 10s) with retry=False for S2 client
- Prevent PDF double-download by saving during analysis and marking pdf_downloaded
- Skip already-downloaded PDFs in export stage
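A minimal sketch of the guarded parse described above, assuming the LLM is expected to return a JSON object. The helper name parse_llm_json and the "return None on failure" policy are hypothetical; the actual analysis.py may handle the failure differently.

```python
from __future__ import annotations

import json
import logging

logger = logging.getLogger(__name__)


def parse_llm_json(raw: str) -> dict | None:
    """Parse an LLM reply as a JSON object; return None instead of raising.

    Hypothetical helper: the real analysis.py may log or fall back differently.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Malformed model output should skip one paper, not crash the run.
        logger.warning("Could not parse LLM JSON: %s", exc)
        return None
    if not isinstance(data, dict):
        # Valid JSON but the wrong shape (e.g. a bare list) is also rejected.
        logger.warning("Expected a JSON object, got %s", type(data).__name__)
        return None
    return data
```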
- Refactor _build_settings to use immutable Settings(**overrides) pattern
- Add --overwrite flag to run command
- Auto-increment output directory name when directory exists and is populated
- Add tests for collision detection and overwrite behavior
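The collision rule above can be sketched as follows. The function names and the "<name>-2", "<name>-3", ... suffix scheme are assumptions for illustration; the PR does not state which naming format it uses.

```python
from pathlib import Path


def _is_taken(path: Path) -> bool:
    """True if writing a new run into `path` could clobber existing output."""
    if not path.exists():
        return False
    # A stray file blocks the name; an empty directory is safe to reuse.
    return not path.is_dir() or any(path.iterdir())


def resolve_output_dir(base: Path, overwrite: bool = False) -> Path:
    """Return `base`, or the first free "<name>-N" sibling if `base` is populated.

    The "-N" suffix (starting at 2) is an assumed naming scheme.
    """
    if overwrite or not _is_taken(base):
        return base
    n = 2
    while _is_taken(base.with_name(f"{base.name}-{n}")):
        n += 1
    return base.with_name(f"{base.name}-{n}")
```

With --overwrite the original directory is reused as-is; otherwise an earlier run's results are never silently mixed with a new one.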
- Write ScreeningResult with score=0 for papers without abstract
- Wrap call_llm in try/except LLMError in query_gen with clear error message
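A sketch of the no-abstract path, under the assumption that ScreeningResult carries at least a paper ID, a score, and a reason. The fields shown and the score_abstract callable (a stand-in for the real LLM call) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ScreeningResult:
    # Hypothetical fields; the project's real model may carry more data.
    paper_id: str
    score: int
    reason: str


def screen_paper(
    paper_id: str,
    abstract: str | None,
    score_abstract: Callable[[str], int],
) -> ScreeningResult:
    """Score one paper, writing an explicit zero when there is no abstract."""
    if abstract is None or not abstract.strip():
        # Skip the LLM call but still emit a result, so the paper is counted
        # and falls below any positive screening threshold.
        return ScreeningResult(paper_id, 0, "no abstract available")
    return ScreeningResult(paper_id, score_abstract(abstract), "scored from abstract")
```

Recording a zero rather than dropping the paper keeps counts consistent between the screening output and later stages.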
- Rename litresearch.toml to litresearch.toml.example (git mv)
- Add html.unescape() for title, abstract, venue in Paper.from_s2()
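A trimmed-down sketch of unescaping in Paper.from_s2(), presumably needed because S2 text fields can contain HTML entities such as "&amp;". The Paper fields here are a hypothetical subset; paperId, title, abstract, and venue are Semantic Scholar Graph API field names.

```python
from __future__ import annotations

import html
from dataclasses import dataclass
from typing import Any


def _unescape(value: str | None) -> str | None:
    # Decode entities like "&amp;" -> "&"; leave None and "" untouched.
    return html.unescape(value) if value else value


@dataclass
class Paper:
    # Hypothetical subset of the project's Paper model.
    paper_id: str
    title: str
    abstract: str | None
    venue: str | None

    @classmethod
    def from_s2(cls, data: dict[str, Any]) -> Paper:
        return cls(
            paper_id=data["paperId"],
            title=_unescape(data.get("title")) or "",
            abstract=_unescape(data.get("abstract")),
            venue=_unescape(data.get("venue")),
        )
```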
- Test query generation with successful LLM response and error handling
- Test screening behavior for no-abstract papers and JSON parse failures
- Test discovery S2 client configuration and paper deduplication
- Add comment for BATCH_SIZE in enrichment.py
- Add run summary block in pipeline.py with timing and counts
- Change screening_threshold default from 40 to 60 with documentation
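One possible shape for a run summary with per-stage timing and counts, sketched with a context manager. The RunSummary class and its output format are hypothetical; the PR does not show what pipeline.py actually prints.

```python
from __future__ import annotations

import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class RunSummary:
    # Hypothetical structure; the real summary block may differ.
    timings: dict[str, float] = field(default_factory=dict)
    counts: dict[str, int] = field(default_factory=dict)

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed wall time even if the stage raises.
            self.timings[name] = time.perf_counter() - start

    def render(self) -> str:
        lines = ["Run summary:"]
        for name, seconds in self.timings.items():
            lines.append(f"  {name:<12} {seconds:6.1f}s")
        for name, value in self.counts.items():
            lines.append(f"  {name:<12} {value:6d}")
        return "\n".join(lines)
```

Usage: wrap each stage in `with summary.stage("discovery"):`, set `summary.counts["papers"] = len(papers)`, and print `summary.render()` at the end of the run.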
Add configurable s2_requests_per_second (default 1.0) and throttle requests in discovery and enrichment to respect Semantic Scholar rate limits. Update example config and add discovery rate-limit test.
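A minimal sketch of a throttle that spaces requests at least 1 / s2_requests_per_second seconds apart. The RateLimiter class and its wait() method are hypothetical names, not necessarily what the PR implements; the clock and sleep hooks exist only to make the sketch testable without real delays.

```python
from __future__ import annotations

import threading
import time
from typing import Callable


class RateLimiter:
    """Space calls at least 1 / requests_per_second seconds apart."""

    def __init__(
        self,
        requests_per_second: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if requests_per_second <= 0:
            raise ValueError("requests_per_second must be positive")
        self._interval = 1.0 / requests_per_second
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = float("-inf")  # first call never waits

    def wait(self) -> None:
        """Block until the next request is allowed, then reserve its slot."""
        # Holding the lock while sleeping serializes concurrent callers.
        with self._lock:
            now = self._clock()
            if now < self._next_allowed:
                self._sleep(self._next_allowed - now)
                now = self._next_allowed
            self._next_allowed = now + self._interval
```

In this sketch, discovery and enrichment would share one limiter instance (built once from s2_requests_per_second) and call wait() before every S2 request. Separate instances per stage would each allow 1 rps and could together exceed a cumulative limit.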
Summary
- Add s2_requests_per_second setting (default 1.0) to match Semantic Scholar key limits
- Update litresearch.toml.example with S2 timeout + rate settings

Why
Your approved Semantic Scholar key is limited to 1 request per second, cumulative across all endpoints, so every stage that calls S2 shares the same budget. This change makes that limit explicit and enforces it by default.
Validation
uv run nox -s lint typecheck test